Skip to content

cuda.core: Cythonize GraphBuilder and Graph with handle-layer cleanup#2008

Draft
Andy-Jost wants to merge 6 commits intoNVIDIA:mainfrom
Andy-Jost:graph-builder-refactor
Draft

cuda.core: Cythonize GraphBuilder and Graph with handle-layer cleanup#2008
Andy-Jost wants to merge 6 commits intoNVIDIA:mainfrom
Andy-Jost:graph-builder-refactor

Conversation

@Andy-Jost
Copy link
Copy Markdown
Contributor

@Andy-Jost Andy-Jost commented May 1, 2026

Summary

Convert GraphBuilder and Graph from Python classes (using _MembersNeededForFinalize + weakref.finalize) to Cython cdef class objects backed by typed C++ resource handles.

This does two things. First, it lays groundwork for step 3 of #1330 (graph updates) by giving graph objects the same handle-based ownership pattern as the rest of cuda.core. Second, it clarifies GraphBuilder's state machine: what used to be a tangle of implicit flags and conditional cleanup paths is now two orthogonal enums — _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY) describing how the builder was created, and _CaptureState (CAPTURE_NOT_STARTED/CAPTURING/CAPTURE_ENDED) tracking the capture lifecycle. Methods can now check exactly the state they care about, illegal transitions are detectable, and __dealloc__ has a single, well-defined condition for ending capture.

Changes

  • Add GraphExecHandle to the resource-handle layer (_cpp/resource_handles.{hpp,cpp}, _resource_handles.{pxd,pyx}), wrapping CUgraphExec with a cuGraphExecDestroy-based deleter run under GILReleaseGuard.
  • GraphBuilder becomes a cdef class with the explicit _BuilderKind/_CaptureState enums described above. Live-API methods (begin_building, end_building, embed, split, join, etc.) move to nogil cydriver paths where practical, and end-of-capture in __dealloc__ runs against the cached StreamHandle rather than reaching into a possibly-cleared Stream attribute.
  • Graph becomes a cdef class holding GraphExecHandle _h_graph_exec directly; update/upload/launch move to nogil cydriver. weakref.finalize is gone.
  • Device.create_graph_builder and Stream.create_graph_builder cimport GraphBuilder and call its _init factory; quoted forward-reference annotations are removed (clears Cython "Strings should no longer be used for type declarations" warnings).

Related work

Andy-Jost added 3 commits May 1, 2026 14:52
…hine

Refactor GraphBuilder from a Python class using _MembersNeededForFinalize
to a cdef class with explicit _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY)
and _CaptureState (NOT_STARTED/CAPTURING/ENDED) tracking. Cleanup moves
into __dealloc__/close, and the builder now uses GraphHandle/StreamHandle
from _resource_handles instead of holding raw driver objects. Drop the
is_stream_owner flag now that StreamHandle owns the lifetime.

End-capture paths in __dealloc__ and close guard on _h_stream so cleanup
is safe even if _init* fails before completing assignment.

Made-with: Cursor
Add a GraphExecHandle to the resource-handle layer (parallel to
GraphHandle) wrapping CUgraphExec with RAII cleanup via
cuGraphExecDestroy on shared_ptr release. Convert Graph from a Python
class using _MembersNeededForFinalize to a cdef class holding a typed
_h_graph_exec attribute, dropping the weakref.finalize machinery.
update/upload/launch move to nogil cydriver paths consistent with the
GraphBuilder rewrite.

Also drop quoted forward-reference annotations on create_graph_builder
and _instantiate_graph/complete now that GraphBuilder is cimported in
_device.pyx and _stream.pyx and Cython accepts the in-module forward
reference to Graph. Clears the related "Strings should no longer be
used for type declarations" warnings.

Made-with: Cursor
The cdef-class member declarations live in the .pxd, so the .pyx does
not need to re-cimport GraphExecHandle, GraphHandle, or StreamHandle.

Made-with: Cursor
@Andy-Jost Andy-Jost added this to the cuda.core v1.0.0 milestone May 1, 2026
@Andy-Jost Andy-Jost added enhancement Any code-related improvements P1 Medium priority - Should do cuda.core Everything related to the cuda.core module labels May 1, 2026
@Andy-Jost Andy-Jost self-assigned this May 1, 2026
… cycle

cimport-ing GraphBuilder at the top of _stream.pyx and _device.pyx made
Cython emit a Python-level import of cuda.core.graph._graph_builder
during _stream module init. That triggered the chain
graph -> _graph_node -> _kernel_arg_handler -> _memory._buffer
-> _device, which then re-entered the still-initializing _stream module
via "from cuda.core._stream import IsStreamT", failing with
ImportError: cannot import name IsStreamT.

Restore the original lazy "import GraphBuilder" inside
create_graph_builder (Stream and Device) and Stream_accept. The return
annotations stay as bare names; "from __future__ import annotations" in
both files defers their evaluation, so they need not resolve at
function-definition time.

Made-with: Cursor
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

@Andy-Jost Andy-Jost marked this pull request as draft May 4, 2026 17:12
@copy-pr-bot
Copy link
Copy Markdown
Contributor

copy-pr-bot Bot commented May 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@Andy-Jost
Copy link
Copy Markdown
Contributor Author

/ok to test

The previous import-cycle fix changed _stream/_device.create_graph_builder
to a lazy Python "import GraphBuilder" instead of a module-level cimport.
With _init declared as @staticmethod cdef, Python attribute lookup
cannot find it, so every test that builds a graph failed with
"AttributeError: type object 'GraphBuilder' has no attribute '_init'"
at _device.pyx:1376 / _stream.pyx:376.

Convert _init from @staticmethod cdef to @staticmethod def (matches the
Stream._init pattern) and drop the cdef declaration from the .pxd.
_init runs once per builder creation, so the loss of cdef-level
dispatch is irrelevant. Graph._init stays cdef; it is only called
intra-module.

Made-with: Cursor
@Andy-Jost
Copy link
Copy Markdown
Contributor Author

/ok to test

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P1 Medium priority - Should do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant